Overview
process_historical_market_breadth.py calculates day-by-day market breadth indicators across the entire stock universe, generating a historical time-series dataset for the Market Breadth Dashboard charts.
Pipeline Position: Phase 4 - Historical analytics generation
Critical Function: Powers breadth trend charts with 250 days of advance/decline, SMA breadth, and momentum indicators
Purpose
This script:
- Processes 250 trading days of historical OHLCV data for all tracked stocks
- Calculates daily breadth metrics (advances, declines, SMA breadth, etc.)
- Merges stock breadth with major index price data
- Outputs a CSV file in a specific row-based format for dashboard consumption
Input Files
all_stocks_fundamental_analysis.json
Master stock list to determine which symbols to process
Individual stock OHLCV files with columns: Date, Open, High, Low, Close, Volume
indices_ohlcv_data/NIFTY.csv
Nifty 50 OHLCV data used to establish the master timeline (last 250 trading days)
Index OHLCV files for:
- NIFTY_MIDCAP_150.csv
- NIFTY_SMALLCAP_250.csv
- NIFTY_MIDSMALLCAP_400.csv
- NIFTY_500.csv
Output Files
Row-based CSV with each metric as a row and dates as columns. Format:
Type of Info,2025-05-15,2025-05-16,2025-05-17,...
Up by 4% Today,23,45,12,...
Down by 4% Today,8,15,5,...
5 Day Ratio,1.45,1.52,1.38,...
Above 200MA %,68.5,69.2,70.1,...
Nifty 50,22450.30,22523.15,22601.80,...
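On the consumer side, this row-based layout can be read back with the standard library alone. A minimal sketch (the sample rows and dates below are illustrative, not real output):

```python
import csv
import io

# Parse the row-based breadth CSV into a dict of
# metric label -> list of values; the header row carries the dates.
sample = """Type of Info,2025-05-15,2025-05-16
Up by 4% Today,23,45
Above 200MA %,68.5,69.2
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)
dates = header[1:]                        # all columns after "Type of Info"
metrics = {row[0]: row[1:] for row in reader}

print(dates)                      # ['2025-05-15', '2025-05-16']
print(metrics["Up by 4% Today"])  # ['23', '45']
```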
Compressed JSON version of the breadth data (currently placeholder in code)
Processing Logic
1. Master Timeline Establishment
Uses Nifty 50’s last 250 trading days as the reference timeline:
LOOKBACK_DAYS = 250
nifty_path = os.path.join(INDEX_OHLCV_DIR, "NIFTY.csv")
nifty_df = pd.read_csv(nifty_path)
timeline = nifty_df['Date'].tail(LOOKBACK_DAYS).tolist()
date_to_idx = {date: i for i, date in enumerate(timeline)}
num_days = len(timeline)
2. Breadth Matrices Initialization
Creates NumPy arrays for efficient metric storage:
# 1-D daily counters, one slot per trading day in the timeline
advances = np.zeros(num_days)
declines = np.zeros(num_days)
above_200ma = np.zeros(num_days)
above_50ma = np.zeros(num_days)
above_20ma = np.zeros(num_days)
above_10ma = np.zeros(num_days)
up_4pc = np.zeros(num_days)
down_4pc = np.zeros(num_days)
high_52w = np.zeros(num_days)
low_52w = np.zeros(num_days)
vol_plus = np.zeros(num_days)
vol_minus = np.zeros(num_days)
3. Stock-Level Processing
For each stock, calculates technical indicators and updates daily counters:
for csv_path in csv_files:
symbol = os.path.basename(csv_path).replace(".csv", "")
if symbol not in valid_symbols: continue
# Re-read full history for technicals to avoid edge effects
full_df = pd.read_csv(csv_path)
full_df['SMA_10'] = full_df['Close'].rolling(10).mean()
full_df['SMA_20'] = full_df['Close'].rolling(20).mean()
full_df['SMA_50'] = full_df['Close'].rolling(50).mean()
full_df['SMA_200'] = full_df['Close'].rolling(200).mean()
full_df['Vol_SMA_20'] = full_df['Volume'].rolling(20).mean()
full_df['H_52W'] = full_df['High'].rolling(252).max()
full_df['L_52W'] = full_df['Low'].rolling(252).min()
full_df['Prev_Close'] = full_df['Close'].shift(1)
full_df['Daily_Ret'] = ((full_df['Close'] - full_df['Prev_Close']) / full_df['Prev_Close']) * 100
# Filter back to timeline
analysis_df = full_df[full_df['Date'].isin(timeline)]
for _, row in analysis_df.iterrows():
idx = date_to_idx.get(row['Date'])
if idx is None: continue
# Metrics Calculation
if row['Close'] > row['Prev_Close']: advances[idx] += 1
if row['Close'] < row['Prev_Close']: declines[idx] += 1
if row['Close'] > row['SMA_200']: above_200ma[idx] += 1
if row['Close'] > row['SMA_50']: above_50ma[idx] += 1
if row['Close'] > row['SMA_20']: above_20ma[idx] += 1
if row['Close'] > row['SMA_10']: above_10ma[idx] += 1
if row['Daily_Ret'] >= 4: up_4pc[idx] += 1
if row['Daily_Ret'] <= -4: down_4pc[idx] += 1
if row['High'] >= row['H_52W']: high_52w[idx] += 1
if row['Low'] <= row['L_52W']: low_52w[idx] += 1
if row['Volume'] > row['Vol_SMA_20']: vol_plus[idx] += 1
else: vol_minus[idx] += 1  # note: rows with NaN Vol_SMA_20 also land here
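The per-row loop above is straightforward but slow across thousands of stocks. One possible vectorized alternative (a sketch, not the script's actual code) accumulates boolean conditions with np.add.at:

```python
import numpy as np

# Vectorized sketch for one stock: idx holds each row's position in the
# master timeline, and rose flags days where Close exceeded Prev_Close.
# Both arrays are illustrative; the real script derives them per stock.
num_days = 5
advances = np.zeros(num_days)

idx = np.array([0, 1, 2, 3, 4])
rose = np.array([True, False, True, True, False])

# Increment only the timeline slots where the condition holds.
np.add.at(advances, idx[rose], 1)
print(advances.tolist())  # [1.0, 0.0, 1.0, 1.0, 0.0]
```

The same pattern applies to each of the other counters (declines, above_200ma, and so on), one boolean mask per condition.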
4. Advance/Decline Ratio Calculation
Calculates rolling A/D ratios:
def calc_ratio(adv, dec, window):
r = []
for i in range(len(adv)):
start = max(0, i - window + 1)
sum_adv = sum(adv[start:i+1])
sum_dec = sum(dec[start:i+1])
ratio = round(sum_adv / sum_dec, 2) if sum_dec > 0 else 1.0  # guard against division by zero
r.append(ratio)
return r
rows.append(to_csv_row("5 Day Ratio", calc_ratio(advances, declines, 5)))
rows.append(to_csv_row("10 Day Ratio", calc_ratio(advances, declines, 10)))
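A small worked example of the window arithmetic (the advance/decline counts are illustrative):

```python
def calc_ratio(adv, dec, window):
    # Rolling sum of advances over rolling sum of declines,
    # falling back to 1.0 when the window holds no declines.
    r = []
    for i in range(len(adv)):
        start = max(0, i - window + 1)
        sum_adv = sum(adv[start:i + 1])
        sum_dec = sum(dec[start:i + 1])
        r.append(round(sum_adv / sum_dec, 2) if sum_dec > 0 else 1.0)
    return r

# Three days of sample counts, 2-day window:
# day 0: 2/1, day 1: (2+3)/(1+1), day 2: (3+1)/(1+2)
print(calc_ratio([2, 3, 1], [1, 1, 2], 2))  # [2.0, 2.5, 1.33]
```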
5. CSV Assembly
Assembles the final CSV in row-based format:
rows = []
rows.append("Type of Info," + ",".join(timeline))
# Momentum Indicators
rows.append(to_csv_row("Up by 4% Today", up_4pc.astype(int)))
rows.append(to_csv_row("Down by 4% Today", down_4pc.astype(int)))
# A/D Ratios
rows.append(to_csv_row("5 Day Ratio", calc_ratio(advances, declines, 5)))
rows.append(to_csv_row("10 Day Ratio", calc_ratio(advances, declines, 10)))
# Breadth Percentages
total_tracked = max(processed_count, 1)
rows.append(to_csv_row("Above 200MA %", np.round(above_200ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 50MA %", np.round(above_50ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 20MA %", np.round(above_20ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 10MA %", np.round(above_10ma / total_tracked * 100, 1)))
# 52-Week Extremes
rows.append(to_csv_row("Reached 52w High", high_52w.astype(int)))
rows.append(to_csv_row("Reached 52w Low", low_52w.astype(int)))
# Volume
rows.append(to_csv_row("Volume greater than 20Day Average", vol_plus.astype(int)))
rows.append(to_csv_row("Volume less than 20Day Average", vol_minus.astype(int)))
# Raw Counts
rows.append(to_csv_row("Advances", advances.astype(int)))
rows.append(to_csv_row("Declines", declines.astype(int)))
# Index Prices
for label, prices in index_data.items():
rows.append(to_csv_row(label, prices))
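The to_csv_row helper is referenced throughout but not shown above; a minimal version consistent with how it is called would be:

```python
def to_csv_row(label, values):
    # Hedged sketch: joins a metric label with its per-day values into one
    # comma-separated CSV row. The script's actual helper may differ.
    return label + "," + ",".join(str(v) for v in values)

print(to_csv_row("Advances", [120, 98, 143]))  # Advances,120,98,143
```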
Output Metrics
Momentum Indicators
Daily count of stocks with +4% or greater return
Daily count of stocks with -4% or worse return
Advance/Decline Ratios
5-day rolling advance/decline ratio
- Values > 1.0 indicate bullish breadth
- Values < 1.0 indicate bearish breadth
10-day rolling advance/decline ratio
Moving Average Breadth
Percentage of stocks trading above their 200-day SMA (daily)
Percentage of stocks trading above their 50-day SMA (daily)
Percentage of stocks trading above their 20-day SMA (daily)
Percentage of stocks trading above their 10-day SMA (daily)
52-Week Extremes
Daily count of stocks hitting new 52-week highs
Daily count of stocks hitting new 52-week lows
Volume Metrics
Volume greater than 20Day Average
Count of stocks with above-average volume
Volume less than 20Day Average
Count of stocks with below-average volume
Index Prices
Daily closing prices for Nifty 50
Daily closing prices for Nifty 500
Daily closing prices for Nifty Midcap 150
Daily closing prices for Nifty Smallcap 250
Daily closing prices for Nifty Midsmallcap 400
Usage Example
python process_historical_market_breadth.py
Expected Output:
⏳ Loading master stock list...
Targeting 2847 stocks for historical breadth.
🧬 Processing stock-level history...
✅ Analyzed 2847 stocks. Merging with Index data...
🚀 Market Breadth Historical Data generated: /path/to/market_breadth.csv
Performance Notes
- Uses NumPy arrays for memory efficiency with large datasets
- Processes full history once per stock to calculate technical indicators correctly
- Filters to timeline only for final analysis to reduce computation
- Avoids edge effects by using full historical data for rolling calculations
Data Quality Notes
SMA Edge Effects Prevention: The script reads the full historical CSV for each stock to calculate SMAs properly, then filters to the 250-day timeline. This prevents incorrect SMA values at the beginning of the timeline.
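The edge effect is easy to demonstrate with a plain rolling mean on synthetic closes (not real data):

```python
# Illustration of SMA edge effects: computing a rolling mean only over the
# display window loses the earliest values, while computing over full
# history first and then slicing to the window does not.
def sma(values, window):
    return [
        round(sum(values[i - window + 1 : i + 1]) / window, 1) if i >= window - 1 else None
        for i in range(len(values))
    ]

full = [float(x) for x in range(1, 11)]  # 10 days of synthetic closes
window_only = full[-5:]                  # last 5 days, as the dashboard shows

print(sma(window_only, 3))  # [None, None, 7.0, 8.0, 9.0]  <- first two days lost
print(sma(full, 3)[-5:])    # [5.0, 6.0, 7.0, 8.0, 9.0]    <- valid throughout
```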
Placeholder Metrics: Some metrics like “Up by 25% in Month” and “Nifty 500 % of W&M RSI > 60” are currently placeholders (zeros) and may be implemented in future versions.